\section{Related and Future Work}\label{sec:future}

In contrast to the vast body of research on sentiment and topic analysis, as well as on generation tasks whose input is artificial or pre-defined, our system implements a full end-to-end cycle from natural language analysis to natural language generation, with applications in social media and automated interaction in real-world settings.


The only two other studies on response generation in social media that we know of are \newcite{Ritter:2011:DRG:2145432.2145500} and \newcite{hasegawa-EtAl:2013:ACL2013}. Both differ from ours in their objective and their approach to generation. Specifically, Ritter's approach is based on machine translation, creating responses by directly re-using previous content; their data-driven approach generates relevant, but not opinionated, responses. In addition, both systems respond to tweets, while our system analyzes and responds to complete articles. Hasegawa's approach is closer to ours in that it generates responses intended to elicit a specific emotion from the addressee; however, it still differs considerably in setting (dialogues versus online posting) and in goal (eliciting emotion versus expressing opinion). We therefore see these studies as complementary to ours in the realm of response generation in social media.
 
A natural contact point between our work and existing research in social media analysis is to investigate how a change in the implementation of individual components (e.g., topic inference or sentiment scoring) would affect the overall generation. In particular, it would be interesting to test whether a mechanism for joint inference of topic and sentiment distributions improves the human-likeness of the generated responses.
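One simple way to connect the two components, short of full joint inference, is to score sentiment per topic rather than per document. The sketch below illustrates the idea with invented toy data (the topics, word probabilities, and lexicon scores are illustrative assumptions, not output of our system): each topic's sentiment is the expectation of a lexicon score under that topic's word distribution.

```python
# Illustrative sketch with hypothetical data: score sentiment per topic
# by weighting a small sentiment lexicon with each topic's word distribution.

# Toy topic-model output: topic -> word probabilities (invented for illustration)
topics = {
    "economy": {"growth": 0.4, "crisis": 0.3, "jobs": 0.3},
    "sports":  {"win": 0.5, "injury": 0.2, "record": 0.3},
}

# Toy sentiment lexicon with scores in [-1, 1] (also invented)
lexicon = {"growth": 0.6, "crisis": -0.8, "jobs": 0.4,
           "win": 0.9, "injury": -0.7, "record": 0.5}

def topic_sentiment(word_probs, lexicon):
    """Expected lexicon score under the topic's word distribution."""
    return sum(p * lexicon.get(w, 0.0) for w, p in word_probs.items())

scores = {t: round(topic_sentiment(d, lexicon), 3) for t, d in topics.items()}
print(scores)  # one signed sentiment value per topic
```

A generation component could then condition its polarity choice on the score of the most prominent topic, instead of on a single document-level sentiment.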


The syntactic and semantic means of expression that we use are based on bare-bones templates and fine-grained POS tags \cite{Theune:2001:DSG:973927.973930}. These may be expanded with additional ways to express subject/object relations, relations between phrases, sentence polarity, and so on. Other generation approaches can factor in such aspects, e.g., the template-based methods of \newcite{becker-2002} and \newcite{conf/aiide/NarayanIR11}, or grammar-based methods, as in \newcite{DeVault:2008:PGN:1708322.1708338}. Using more sophisticated generation methods with a rich grammatical backbone may counteract the sensitivity to computer-generated response patterns that our human raters acquired over time.
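In the spirit of the bare-bones, POS-slotted templates described above, the following minimal sketch shows how such a filler might work; the templates and the tag-keyed lexicon are invented for illustration and are not the templates used in our system.

```python
# Hypothetical sketch of POS-slotted template filling.
# Templates and fillers below are invented examples.
import random

TEMPLATES = [
    "I {ADV} {VERB} the part about {NOUN}.",
    "The {NOUN} issue {VERB}s me {ADV}.",
]

# Candidate fillers keyed by coarse POS tag; a sentiment polarity could
# further restrict the choice (e.g., only negative verbs for polarity < 0).
LEXICON = {
    "ADV":  ["really", "truly"],
    "VERB": ["like", "question"],
    "NOUN": ["economy", "election"],
}

def fill(template, lexicon, rng):
    """Replace each {TAG} slot with a randomly chosen word of that tag."""
    out = template
    for tag, words in lexicon.items():
        out = out.replace("{" + tag + "}", rng.choice(words))
    return out

rng = random.Random(0)
print(fill(TEMPLATES[0], LEXICON, rng))
```

The brittleness discussed above is visible even here: agreement, tense, and voice must be baked into each template, which is exactly what a grammar-based backbone would relax.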

Furthermore, our result concerning the human-likeness of \(g_{\rm kb}\) clearly demonstrates that semantic knowledge must be brought in to support better, more human-like response generation.
Large-scale knowledge graphs such as Freebase support many semantic tasks \cite{Jacobs:1985:KAL:894296}, and can provide richer context for automatically generating human-like responses.
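As a rough illustration of how such grounding could feed the generator, the sketch below uses a toy in-memory graph standing in for a resource like Freebase; the entities, relations, and values are invented, and a real system would query the actual knowledge base instead.

```python
# Illustrative sketch only: a toy in-memory knowledge graph standing in
# for a large-scale resource. All entries below are invented examples.
KG = {
    "Tesla":  {"type": "company", "industry": "automotive",
               "founder": "Elon Musk"},
    "Vienna": {"type": "city", "country": "Austria"},
}

def enrich(keyword, kg):
    """Return facts about a topic keyword, usable as extra template slots."""
    facts = kg.get(keyword)
    if facts is None:
        return []
    return [f"{keyword} ({rel}: {val})" for rel, val in facts.items()]

print(enrich("Tesla", KG))
```

A response generator could draw on such facts to mention related entities, making the response better grounded in the article's topic.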


From a theoretical viewpoint, the system would clearly benefit from a rigorous analysis of human interaction in
online media. Responses to user-generated content on the Internet share linguistic characteristics in structure, length, and manner of expression. Studying these features theoretically, and then examining them empirically using a Turing-like evaluation as presented here, would take us a significant step toward better generation, as well as a better understanding of the processes underlying human response generation.

This understanding may be complemented with insights into the causes, motivations, and
intricacies of human interaction in such environments, as studied by sociologists and psychologists. In particular,
our preliminary interaction with colleagues from communication studies suggests that the present
endeavor nicely complements that of ``persuasive computing'' \cite{Fogg:1998:PCP:274644.274677,Fogg:2002:PTU:764008.763957}, and we hope that this collaboration will lead to valuable synergies.

Finally, bridging the gap between the technical and the theoretical, it
would be fascinating to test the responses in the context for which they are
generated -- social media.
Generated texts could be posted as responses to the original articles, or shared along with links to them, and the reactions to, and shares of, those responses could then be measured. Such a real-world evaluation could indicate whether generated responses are indeed believable and engaging, and may better approximate a Turing-like test in which machine-generated responses cannot be distinguished from human ones.

